|
SNV calling from NGS data refers to a range of methods for identifying single nucleotide variants (SNVs) from the results of next-generation sequencing (NGS) experiments. These are computational techniques, in contrast to experimental methods based on known population-wide single nucleotide polymorphisms (see SNP genotyping). With the increasing abundance of NGS data, these techniques are becoming increasingly popular for performing SNP genotyping, and a wide variety of algorithms have been designed for specific experimental designs and applications. Beyond the usual application domain of SNP genotyping, these techniques have been successfully adapted to identify rare SNPs within a population and to detect somatic SNVs within an individual using multiple tissue samples.

== Methods for detecting germline variants ==

Most NGS-based methods for SNV detection are designed to detect germline variants in an individual's genome. These are the variants that an individual inherits from their parents, and they are the usual target of such analysis (except in specific applications where somatic mutations are sought). Very often, the variants sought occur with some (possibly rare) frequency throughout the population, in which case they may be referred to as single nucleotide polymorphisms (SNPs). Technically, the term SNP refers only to this kind of variation; in practice, however, it is often used synonymously with SNV in the literature on variant calling. In addition, since detecting germline SNVs requires determining the individual's genotype at each locus, the phrase "SNP genotyping" may also be used for this process, although the same phrase may also refer to wet-lab experimental procedures for classifying genotypes at a set of known SNP locations.
The usual process for such techniques consists of:
# Filtering the set of NGS reads to remove sources of error or bias
# Aligning the reads to a reference genome
# Using an algorithm, based either on a statistical model or on heuristics, to predict the likelihood of variation at each locus from the quality scores and allele counts of the aligned reads at that locus
# Filtering the predicted results, often based on metrics relevant to the application
# SNP annotation to predict the functional effect of each variant

The usual output of these procedures is a VCF file.

Excerpt source: Wikipedia, the free encyclopedia.
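The variant-prediction step of the process above (scoring each locus from the quality scores and allele counts of the aligned reads) can be illustrated with a minimal sketch. It is not the method of any particular variant caller. It assumes a diploid genome, independent reads, Phred-scaled base qualities, sequencing errors that are equally likely to produce any of the three other bases, and no prior over genotypes. The function names `phred_to_error`, `genotype_log_likelihoods` and `call_genotype` are hypothetical, chosen only for this illustration.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def genotype_log_likelihoods(pileup):
    """Return log10 P(reads | genotype) for each unordered diploid genotype.

    pileup: list of (base, phred_quality) pairs, one per read covering the locus.
    Assumption (illustrative): each read is drawn from either allele with
    probability 0.5, and an error yields any other base with probability e/3.
    """
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        total = 0.0
        for base, q in pileup:
            # Cap the error rate at 0.75, where a base call is uninformative;
            # this also avoids taking the logarithm of zero for Q = 0.
            e = min(phred_to_error(q), 0.75)
            p1 = 1 - e if base == a1 else e / 3
            p2 = 1 - e if base == a2 else e / 3
            total += math.log10(0.5 * p1 + 0.5 * p2)
        result[a1 + a2] = total
    return result


def call_genotype(pileup):
    """Pick the genotype with the highest likelihood (no prior applied)."""
    likelihoods = genotype_log_likelihoods(pileup)
    return max(likelihoods, key=likelihoods.get)


# Toy pileup: five reads supporting A and five supporting G, all at Q30.
het_pileup = [("A", 30)] * 5 + [("G", 30)] * 5
# Toy pileup: ten reads supporting A, all at Q30.
hom_pileup = [("A", 30)] * 10

print(call_genotype(het_pileup))
print(call_genotype(hom_pileup))
```

Comparing the called genotype with the reference base at the locus then indicates whether a variant is present. Production callers typically use more elaborate models, for example incorporating genotype priors and mapping quality, before the filtering and annotation steps.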
|